############################
# PAS-seq Pipeline using Bowtie 1.0
############################

0. Preprocess
# Remove reads that do not contain a polyA stretch, and trim the barcode from each remaining read.
Use "preprocess.py" in the myPythonTool folder.
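A minimal sketch of what this step does. It assumes a fixed-length barcode at the 5' end of each read and treats a run of at least 10 As as "contains polyA". Both values (BARCODE_LEN, MIN_POLYA) and the function names are hypothetical, and the rules inside preprocess.py may differ.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Hypothetical parameters: preprocess.py's real barcode length and polyA rule are not given here.
BARCODE_LEN = 4   # assumed length of the 5' barcode
MIN_POLYA = 10    # assumed: a run of >= 10 As counts as "contains polyA"

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def preprocess(records: Iterable[Record]) -> Iterator[Record]:
    """Trim the barcode, then keep only reads that still contain a polyA run."""
    polya = re.compile("A{%d,}" % MIN_POLYA)
    for header, seq, qual in records:
        seq, qual = seq[BARCODE_LEN:], qual[BARCODE_LEN:]
        if polya.search(seq):
            yield header, seq, qual
```

Trimming before the polyA check keeps A-rich barcodes from being mistaken for a polyA tail.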

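1. Alignment
# -q: FASTQ input; -n 2: allow up to 2 mismatches in the seed; -m 1: keep only reads with a
# single reportable alignment; -S: SAM output. [mm9_indexed_refGenome] is the index basename.
module load bowtie/1.0.0
bowtie -q -n 2 -m 1 -S [mm9_indexed_refGenome] [input.fastq] > [output.sam] 2> [output.log]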
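As a quick sanity check on the SAM output, the sketch below tallies records by their FLAG field: bit 4 marks an unmapped read and bit 16 a reverse-strand alignment (the same flag that step 4 uses to split strands). The function name summarize_sam is hypothetical.

```python
from collections import Counter
from typing import Iterable


def summarize_sam(lines: Iterable[str]) -> Counter:
    """Count SAM records as forward, reverse, or unmapped using the FLAG field."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd SAM column
        if flag & 4:       # 0x4: read unmapped
            counts["unmapped"] += 1
        elif flag & 16:    # 0x10: aligned to the reverse strand
            counts["reverse"] += 1
        else:
            counts["forward"] += 1
    return counts
```

Passing an open file handle works, since iterating a file yields its lines.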

1.5 Remove internally primed reads
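No commands are recorded for this step. One common heuristic, shown here only as an assumed sketch, flags a read as internally primed when the genomic sequence just downstream of its cleavage site is A-rich: at least 6 consecutive As, or at least 7 As in the next 10 nt. The thresholds, which read end is taken as the cleavage site, and the function names are all assumptions, not this pipeline's settings.

```python
import re
from typing import Dict

# Assumed thresholds; not taken from this pipeline.
WINDOW = 10       # nt of genome inspected downstream of the cleavage site
MIN_A_RUN = 6     # flag if >= 6 consecutive As ...
MIN_A_COUNT = 7   # ... or >= 7 As anywhere in the window

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def downstream_seq(genome: Dict[str, str], chrom: str, site: int, strand: str,
                   n: int = WINDOW) -> str:
    """Return n nt downstream of a 0-based cleavage site, in transcript orientation."""
    seq = genome[chrom]
    if strand == "+":
        return seq[site + 1: site + 1 + n].upper()
    # Minus strand: downstream lies at lower coordinates, so reverse-complement it.
    start = max(0, site - n)
    return seq[start:site].upper().translate(COMPLEMENT)[::-1]


def is_internal_priming(window: str) -> bool:
    """True if the downstream window is A-rich enough to suggest oligo-dT mispriming."""
    longest_run = max((len(run) for run in re.findall("A+", window)), default=0)
    return longest_run >= MIN_A_RUN or window.count("A") >= MIN_A_COUNT
```

Calling upper() first means soft-masked (lowercase) genome sequence is handled the same as uppercase.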


2. Remove duplicates
# Convert the SAM file to BED format so that PCR duplicates can be removed with "PCRdup_BEDcombiner.py" (this step was later dropped in "tophat_process").
# "PAS_incrementer.py" also takes BED as input.
convert2bed --input=sam --output=bed < [input.sam] > [output.bed]
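A sketch of duplicate collapsing on BED records. It assumes standard BED6 column order (strand in column 6) and treats reads with identical chromosome, start, end, and strand as PCR duplicates. This rule is an assumption, and PCRdup_BEDcombiner.py may use different criteria.

```python
from typing import Iterable, List


def dedup_bed(lines: Iterable[str]) -> List[str]:
    """Keep the first read for each (chrom, start, end, strand); drop the rest as duplicates."""
    seen = set()
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], fields[1], fields[2], fields[5])  # BED6: strand is column 6
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```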

3. Prepare BAMs
# Convert the BED file to BAM format so that "bamCoverage" can generate bigWig tracks; later I switched to "bedGraphToBigWig".

module load bedtools/2.23.0
# -g expects a genome file (chromosome name<TAB>length per line), not a FASTA reference
bedtools bedtobam -i [input.bed] -g [genome.chrom.sizes] > [output.bam]
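bedtools bedtobam -g expects a genome file with one chromosome name and its length per tab-separated line. Assuming a samtools faidx index of the reference (the .fai file, whose first two columns are name and length) is available, the genome file is just those two columns, equivalent to `cut -f1,2` on the .fai. The function name is hypothetical.

```python
from typing import Iterable, List


def fai_to_genome_file(fai_lines: Iterable[str]) -> List[str]:
    """Turn .fai index lines into 'name<TAB>length' lines for bedtools -g."""
    out = []
    for line in fai_lines:
        if not line.strip():
            continue
        name, length = line.rstrip("\n").split("\t")[:2]  # .fai: NAME, LENGTH, OFFSET, ...
        out.append(f"{name}\t{int(length)}")
    return out
```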

module load samtools/1.1
samtools sort -T [temp_prefix] -o [output.sorted.bam] [input.bam]
# bamCoverage requires an indexed BAM (.bai)
samtools index [output.sorted.bam]
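The notes mention switching to bedGraphToBigWig. Here is an assumed sketch of producing a per-base coverage bedGraph directly from the BED file. Coverage can optionally be restricted to one strand to mirror the forward/reverse/combined tracks. It does no normalization and does not merge adjacent intervals with equal depth. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Optional


def bed_to_bedgraph(lines: Iterable[str], strand: Optional[str] = None) -> List[str]:
    """Per-base read coverage as bedGraph lines (chrom, start, end, depth).

    strand="+" or "-" keeps only reads on that strand (BED6 column 6); None combines both.
    """
    events = defaultdict(lambda: defaultdict(int))  # chrom -> position -> depth change
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if strand is not None and f[5] != strand:
            continue
        chrom, start, end = f[0], int(f[1]), int(f[2])
        events[chrom][start] += 1  # coverage rises where a read starts
        events[chrom][end] -= 1    # and falls at its (exclusive) end
    out = []
    for chrom in sorted(events):
        depth = 0
        positions = sorted(events[chrom])
        for pos, nxt in zip(positions, positions[1:]):
            depth += events[chrom][pos]
            if depth > 0:
                out.append(f"{chrom}\t{pos}\t{nxt}\t{depth}")
    return out
```

Chromosomes come out in byte order and positions in numeric order, matching `LC_COLLATE=C sort -k1,1 -k2,2n`, which bedGraphToBigWig expects. The bigWig is then built with `bedGraphToBigWig in.bedGraph chrom.sizes out.bw`.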

4. Generate tracks
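# Generate tracks for the forward strand, the reverse strand, and both strands combined.
# bamCoverage is part of deepTools. SAM flag 16 marks reverse-strand alignments, so excluding it
# keeps forward-strand reads and including it keeps reverse-strand reads.
# --normalizeUsingRPKM is deepTools 2.x syntax; deepTools 3 replaced it with --normalizeUsing RPKM.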
bamCoverage -b [input.sorted.bam] --normalizeUsingRPKM --samFlagExclude 16 --ignoreDuplicates --binSize=2 -p 32 -o [output.f.bw]
bamCoverage -b [input.sorted.bam] --normalizeUsingRPKM --samFlagInclude 16 --ignoreDuplicates --binSize=2 -p 32 -o [output.r.bw]
bamCoverage -b [input.sorted.bam] --normalizeUsingRPKM --ignoreDuplicates --binSize=2 -p 32 -o [output.c.bw]
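For reference, deepTools defines per-bin RPKM as reads in the bin divided by (bin length in kb × mapped reads in millions). With --binSize=2 each bin is only 0.002 kb, so values are large compared with gene-level RPKM. The sketch below shows the arithmetic. The function name is hypothetical and the numbers in the usage line are illustrative, not from this dataset.

```python
def rpkm(reads_in_bin: int, bin_len_bp: int, total_mapped_reads: int) -> float:
    """reads / (kb * millions of mapped reads), rearranged as reads * 1e9 / (bp * total)."""
    if bin_len_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("bin length and mapped-read total must be positive")
    return reads_in_bin * 1e9 / (bin_len_bp * total_mapped_reads)


# 4 reads in a 2-bp bin with 20 million mapped reads
print(rpkm(4, 2, 20_000_000))  # → 100.0
```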


5. Generate read counts table
# Use the master list of PASs ("Tian PAS data.txt") as the reference, available at:
https://cbcl.ics.uci.edu/public_data/shilab/PASseq_szang_20160302/Tian%20PAS%20data.txt
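A sketch of building the counts table. It assumes the master list has already been parsed into (chromosome, strand, position) tuples and that each read has been reduced to its cleavage-site position. The column layout of "Tian PAS data.txt" is not described here, so parsing is left out. The ±20 nt window and the function name are assumptions.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW = 20  # assumed: reads within +/-20 nt of a PAS are assigned to it

Site = Tuple[str, str, int]  # (chrom, strand, 0-based position)


def count_reads_per_pas(pas_list: List[Site], read_sites: Iterable[Site],
                        window: int = WINDOW) -> Dict[Site, int]:
    """Count same-strand read sites within +/-window nt of each PAS."""
    by_key = defaultdict(list)  # (chrom, strand) -> read positions
    for chrom, strand, pos in read_sites:
        by_key[(chrom, strand)].append(pos)
    for positions in by_key.values():
        positions.sort()
    counts = {}
    for pas in pas_list:
        chrom, strand, pos = pas
        positions = by_key.get((chrom, strand), [])
        lo = bisect.bisect_left(positions, pos - window)   # first read >= pos - window
        hi = bisect.bisect_right(positions, pos + window)  # first read > pos + window
        counts[pas] = hi - lo
    return counts
```

A read that falls within the window of two nearby PASs is counted for both. A stricter table would assign it to the closest PAS only.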